Whole exome sequencing of high-risk neuroblastoma identifies novel non-synonymous variants

Neuroblastoma (NBL), one of the main death-causing cancers in children, is known for its remarkable genetic heterogeneity and varied patient outcome spanning from spontaneous regression to widespread disease. Specific copy number variations and single gene rearrangements have been proven to be associated with biological behavior and prognosis; however, there is still an unmet need to enlarge the existing armamentarium of prognostic and therapeutic targets. We performed whole exome sequencing (WES) of samples from 18 primary tumors and six relapse samples originating from 18 NBL patients. Our cohort consists of 16 high-risk, one intermediate, and one very low risk patient. The obtained results confirmed known mutational hotspots in ALK and revealed other non-synonymous variants of NBL-related genes (TP53, DMD, ROS, LMO3, PRUNE2, ERBB3, and PHOX2B) and of genes cardinal for other cancers (KRAS, PIK3CA, and FLT3). Beyond, GOSeq analysis determined genes involved in biological adhesion, neurological cell-cell adhesion, JNK cascade, and immune response of cell surface signaling pathways. We were able to identify novel coding variants present in more than one patient in nine biologically relevant genes for NBL, including TMEM14B, TTN, FLG, RHBG, SHROOM3, UTRN, HLA-DRB1, OR6C68, and XIRP2. Our results may provide novel information about genes and signaling pathways relevant for the pathogenesis and clinical course in high-risk NBL.


Introduction
Neuroblastoma (NBL) presents a major challenge in paediatric oncology due to its highly variable clinical appearance and extreme genetic heterogeneity [1][2][3]. It is the most common extra-cranial solid tumor in children, originating from immature precursors of sympathic performed. Variants of genes known to be important for NBL development and aggressiveness were detected with a mean number of non-synonymous variants of 28 (range . Hotspot mutations in the ALK gene were identified in tumor samples from five of these high-risk NBL patients. Mutations in genes previously reported playing a role in NBL disease development were found, including PHOX2B, TP53, DMD, ROS, LMO3, PRUNE2, and ERBB3, members of the MAPK family and ABCA2 genes. Our patient samples revealed few recurrent mutations; however, in addition to the ALK gene, variants in nine biologically relevant genes were identified being mutated in more than one patient i.e. TMEM14B, TTN, FLG, RHBG, SHROOM3, UTRN, HLA-DRB1, OR6C68, and XIRP2. Some potentially important genes were detected by pathway analysis and we hypothesize that further studies of their functional role in the origin and progression of NBL could lead to the discovery of new potential biomarkers.

Results
A total of 16 primary and four relapse tumor samples of patients diagnosed with high-risk NBL were investigated by WES. Two additional samples of patients not fulfilling the clinical INRG high-risk disease criteria were also included in the study: patient 7 was diagnosed with a localized tumor without MYCN amplification but with an 11q deletion (intermediate risk) and experienced two relapses at different time points, and patient 15 was classified as stage MS with very low risk (no MNA, no 11q deletion), but displayed two segmental aberrations (+2p and +17q). For patient 23, a sample collected at time of diagnosis was not available. All tissue samples were taken prior to therapy, except the primary tumor of patient 14, which was collected after initial chemotherapy treatment had started. The median age at time of diagnosis was 44 months (range 1. 5-192 months). Five patients were under the age of 18 months at the time of diagnosis and one of them was MYCN amplified. Among 18 patients at the age of �18 months, four were MYCN amplified and eight showed an 11q deletion (Table 1). Eight patients relapsed between 10 months and 3 years after diagnosis; seven patients died from the disease. They all belonged to the group of �18 months.

Variants detected by WES
The total number of detected variants in the primary tumor samples revealed extreme variations between individual patients, ranging from 12 variants in patient 2 to 1687 variants detected in patient 12. The average number of variants for patients of �18 months is 237, while for those below 18 months was 14 variants. The mean and median numbers of total variants were 190 and 54, respectively. The mean and median numbers of all detected non-synonymous variants were 28 and 5, ranging from 1 to 346. The total numbers of variants detected in both, primary and relapse tumor samples, were classified into different Tiers (Fig 1A and 1B and S1 and S2 Tables). For the relapsed samples, the mean and median numbers of all detected non-synonymous variants were 66 and 27, ranging from 1 to 424. Numbers of variants detected in relapse samples were higher compared to the corresponding primary tumor samples (S3 Table). The average tumor cell percentage in all samples was 79 (60-90) (S6 Table). For all detected variants, the tumor allele frequency (TAF) is provided (S7 Table).
No Tier 1 variants with strong clinical significance for the user-specified cancer type have been identified in our study. Tier 2 variants were found in primary and relapse samples and included six variants in four genes, three in ALK, one in the FLT3, one in KRAS, and one in PIK3CA ( Table 2). The PIK3CA gene was not mutated in the primary tumor of patient 17. For patient 23, no material from the primary tumor was available.
Based on the Personal Cancer Genome Reporter (PCGR) report, we were able to identify ALK mutations in five patients. Three were classified as being clinically significant (Table 2),  (Table 3A and 3B). Several of these variants have previously been reported to play a potential role in NBL disease development, including TP53, DMD, ROS, LMO3, PRUNE2, and ERBB3, members of the MAPK family (MAP2K4 and MAP2K7) and ABCA2 genes. Additionally, two variants in the PHOX2B gene were found (classified as Tier 4): one intron variant in patient 12 and one missense variant in patient 6. To evaluate the cancer-associations of certain detected variants, the MutationAssessor predictor algorithm was applied. This sequence-based tool uses the impact of mutations to rank genes according to their significance for cancer. Taking into account the functional impact of amino-acid substitutions on proteins, it gives a functional impact score (FIS) for every nonsynonymous mutation. If the FIS is >2.00, the mutation is considered to have a damaging effect [38,39]. The PHLPP1 gene was mutated in two of the patients with the prediction of a damaged protein variant c.4856G>A in patient 9 and tolerated c.2791G>A variant in patient 12. Beside the observed PHLPP1 mutations, additional genes related to the RET signaling pathway were detected and classified into Tier 3, both in primary and relapse tumor samples :  ERBB3, MET, PDGFB, RET, IRS2, DUSP10, AKT1, RIT1, MAPK7, NFKB1, MAP2K7,  MAP2K4, MEF2C, and MMP9.
Variants classified into Tier 3 were grouped into mutations detected only in the primary tumor, shared by the primary and relapse tumor, or unique for relapse samples. Only patient 1 revealed shared variants in nine genes, with TAFs ranging from 14 to 83% (S7 Table); however, additional 20 and 31 gene variants were found to be unique for the primary and relapse, respectively ( Table 4). The other three patients with both the primary and relapse tumor samples did not share gene variants classified into Tier 3. We observed shared mutated variants of DUSP10, NR1H4, and MED12 genes between two relapses of patient 7 (Table 5). RET gene variants were detected in two of the relapse samples, patient 1 and 17; however, at different positions (c.2081G>A; c.395T>C).
Patient samples 2, 6, 8, and 15 exhibited a general lower mutation burden of 12, 96, 12, and 24 detected variants; and neither Tier 2 nor 3 variants were detected. Oncoscore was applied to investigate further potential targets in Tier 4 classified variants in these patients. In patient 2, a total of 12 mutations were found, whereof six were classified as Tier 4 with the highest reported Oncoscore value at~0.26 in a non-synonymous variant of the NCKAP5 gene, predicted to cause alterations of the resulting protein. All other identified non-synonymous mutations exhibited an Oncoscore below~0.27. In patient 6, 31 out of a total of 96 mutations were classified as Tier 4. A non-synonymous variant was detected for the TNIK gene, predicted to produce a damaged protein, revealing an Oncoscore of~0.624. Additionally, mutated variants of PHOX2B gene were detected with Oncoscore~0.26 (p.Glu129Ter c.385G>T). Unfortunately, MutationAssessor does not provide prediction assessment for this variant. In the tumor sample of patient 8, a total of 12 mutations were determined, of which three were classified into

Patient ID Tumor suppressors Protooncogenes
Tier 4. A non-synonymous variant of the PSMC3 gene was found and reported as predicted to produce a malformed protein but with a low Oncoscore of~0.27. In patient 15, 24 mutations were detected, seven of them being classified as Tier 4. One variant found in the AHNAK2 gene was previously reported as being destructive resulting in a changed protein with an Oncoscore of~0.60 (S5 Table).

Identification of potentially damaging genes and variants
WES analysis of all samples included in this study revealed a total of 3426 variants. To select a dataset of more biologically relevant variations, filtering was performed using the MutationAssessor results from PCGR. This filtering step reduced the total number of events to 320 potential pathogenic variants. Among these variants, only so far unreported genes mutated in more than one patient are presented (thus, known hotspot genes like ALK were excluded). Based on the filtering described above, 17 biologically relevant variants in nine biologically relevant genes were identified in nine patients: TMEM14B, HLA-DRB1, OR6C68, TTN, FLG, RHBG, SHROOM3, UTRN, and XIRP2. Two genes with identical variants were present in two patients (TMEM14B and HLA-DRB1) with a TAF ranging from 44 to 58%, and seven genes with different variants were identified (Tables 6 and S7). Two patients (7 and 12) with Genes with detected variants in patients 1, 4, 5, 7, 9, and 11, 12, 14, 16,17, 18, and 20 at the time of diagnosis, classified into Tier 3.
Genes with detected variants in patients 1, 6, 7, 9, 17, and 23 at the time of relapse, classified into Tier 3. Samples 7 � and 7 �� are two subsequent relapses in the same patient; -, not applicable https://doi.org/10.1371/journal.pone.0273280.t003 mutations of the OR6C68 gene at 12q13.2 showed copy number gains of the corresponding part of 12q (unpublished data). We observed a tendency to obtain mutated variants more frequently in the primary tumor samples of patient 1 and 12.

Gene ontology analysis
Gene ontology analysis (GOseq) for non-synonymous variants revealed several pathways with potential biological impact on NBL, including neuron cell-cell adhesion, biological adhesion, and PI3K/Akt signaling pathway, immune response-regulating cell surface receptors, or innate immune response activating cell surface receptors (Table 7). In these pathways, the RET, PIK3CA, PHLPP1, KRAS, NFKB1, MUC5B, and MUC6 genes were found to be mutated in at least two patients (Table 5). Further signaling pathways with potential biological significance in NBL and details of the analysis are presented in S5 Table.

Genes reported with different ranking rules
In all of the investigated samples, a total of 2722 out of 20805 genes of the human reference genome 37 (grch37) revealed at least one mutation. To identify cardinal genes of NBL and to evaluate the identified mutated genes, different ranking rules were applied. First, genes were  ranked according to the average number of mutations, resulting in a tendency to rank long genes higher. Next, ranking by average divided by gene length was performed and finally, genes were ranked by the total number of observations with mutations in the gene ( Table 8).
The TTN gene was reported among the top 10 under two different applied ranking rules.
Besides TTN, there were no predominant findings detected except two genes listed (TRIM9, PKHD1) in top 10, previously reported in the context of NBL.

Discussion
Different research groups have used sequencing and CNVs analysis in various efforts to explore the nature of NBL and to find novel strategies for prognostic assessment and  [44]. In our study, we identified novel coding variants of genes possibly contributing to the understanding of these processes. Beside the confirmation of known mutations in ALK, we identified changes in genes of the RET signalling pathway, the RAS-MAPK and p53 signalling pathway, immune response genes, and other previously described NBL-related genes. We also detected specific features of the relapse tumor samples, several over-represented genes, and novel non-synonymous variants of genes occurring in more than one patient sample. Presumably, an increased allelic frequency correlates with the oncogenic potential of identified gene variants [45]. In our study, the average TAF for all detected variants is higher than 48%, supporting the assumption that the detected mutations could influence the NBL progression.

ALK mutations and the RET signalling pathway
ALK activating mutations were identified in five patients, all at hotspot positions, that might be candidates for using ALK-targeted inhibitors [22]. Three of the patients carrying these mutations were above the age of 18 months at the time of diagnosis and one was exactly at the critical age of 18 months. In patient 9, an ALK mutation was found in addition to a mutated variant of the PHLPP1 gene. PHLPP1 is related to the RET signalling pathway and known for the promotion of tumor progression [46]. The RET gene is involved in neural crest development and ontogenesis of the enteric nervous system. Besides, RET is commonly expressed in NBL [47]. The PHLPP1-and additional genes of the RET signalling pathway were detected in six other primary and relapse tumor samples included in the study ( Table 3). All of these patients were 18 months of age or older at time of diagnosis. Additionally, five out of seven patients presented an 11q deletion, one was MNA and for one there was no clinically relevant genomic changes detected. Our analysis detected two different variants of the RET gene in the relapsed samples of patient 1 and patient 17; however, they were classified by MutationAssessor as predicted to be tolerated. Moreover, in six patients GOSeq analysis identified neuron cell-cell adhesion and biological adhesion pathways, both pathways include the RET gene ( Table 7).  [48]. Both of these genes show mutated variants in a relapse sample of patient 6 and primary tumor samples of patient 1 and 12; all three patients present an 11q deletion and were above 18 months at the time of diagnosis. Poor prognosis in NBL patients is also associated with mutations in genes of the p53 signalling pathway, e.g. CREBBP [28]. A mutated variant in the CREBBP gene (c.7126G>A) was detected in the primary tuomr of patient 1, presenting an 11q deletion and relapse disease.

Previously described NBL-related genes
Additional mutations were detected in genes, previously reported to play a potential role in NBL disease development, including TP53, DMD, ROS, LMO3, PRUNE2, and ERBB3, members of the MAPK family (MAP2K4 and MAP2K7) and the ABCA2 gene [48,50,51]. In three patients with MYCN amplification mutated variants of NFKB1, ALK, and SULF2 were detected. There is evidence that SULF2 is over-expressed in MNA NBL cell lines [52]. All three patients were older than 18 months at the time of diagnosis.

Immune response genes
Immunotherapy with the anti-GD2 antibody is an important step of standard treatment protocol for high-risk NBL patients. It is based on inducing immune responses, e.g. by infusing monoclonal antibodies against the tumor-associated disialoganglioside GD2, combined with for example granulocyte-macrophage colony-stimulating factor and interleukin-2 [53]. Yet, 40% of NBL patients relapse [54]. Following GOSeq results, we identified mutations in genes involved in immune response-regulating cell surface receptor signalling pathways, e.g. mucines (MUC5B, MUC6, and MUC16), KRAS, PIK3CA, and NFKB1 genes. All changes were found in the relapse samples, except the KRAS variant. Mutations in immune-response related genes detected in relapse tumor, followed by specific functional studies, could provide an important answer what differs in responders and non-responders of immunotherapy and why it is still not that effective. There is no reported association of genes of the MUC gene family with NBL, but these genes are known to play a role in cancer cells differentiation and proliferation, interacting, and regulating tumor microenvironment [55,56].

Specific features of the relapse tumor samples
The relapsed samples examined in our study exhibited more mutated variants in comparison to the primary tumor (both non-synonymous and synonymous), confirming previous findings [33]. This leads to genetic instability and diversity being a major obstacle in the research on prognostic markers and successful treatment in NBL. Five out of six patients who experienced a relapse presented an 11q deletion and one was MNA. All these patients died from the disease. Genes found mutated in relapse of patient 1, such as LMO3 or RET, are associated with unfavourable outcome of the disease and tumor progression [57,58]. The TAFs of these genes were 83 and 56%, respectively (S7 Table). Based on our findings, the clinical outcome of the patient and the published literature, we speculate that the detected mutations may have rendered these genes to become oncogenic drivers contributing to an unfavourable prognosis and disease development.
Recent studies suggest that AKT is a critical prognostic factor for NBL and that its expression is correlated with poor prognosis of the disease [59]. Additionally, we observed recurrent mutations in DUSP10 in relapse samples of patient 7. DUSP10 is a member of regulators of neuronal cell growth and differentiation [60]. Taking into account that the TAF of DUSP10 amounted to 48%, and the tumor cell infiltration was 80%, indicates the possibilities of a driver oncogene in this relapse.

Specific over-represented mutated genes
The purposes of this study were to discover novel mutations crucial for the origin, progression, or treatment on high-risk NBL. To this end, we asked whether certain mutations were overrepresented in our samples. Despite the relatively low number of patients, we were able to find novel candidates. Due to the high diversity in the mutated genes, different ranking rules were applied in various analytic attempts. Of the top 10 candidates in the various lists, only the TRIM9 and PKHD1 genes have been previously reported in the context of NBL [61,62].

Novel non-synonymous variants of genes occurring in more than one patient
The biological relevance of non-synonymous variants of genes occurring in more than one patient and predicted to cause protein changes were analysed by the mutation effect predictor MutationAssessor algorithm. Here, we describe some interesting candidates detected in this group : TMEM14B, OR6C68, TTN, SHROOM3, and UTRN. There is no record of TMEM14B being linked to NBL and the association of TMEM14B to cancer is not clear [33]; however, other members of the TMEM protein family have been found in some NBL samples. There is no clinical pattern between patients with mutated variant of TMEM14B, but one of two presents 11q deletion together with an age above 18 months at the time of diagnosis.
A possible association of OR6C68 to cancer has so far not been described, but the expression of other members of the odorant receptor family genes is documented in olfactorial NBL [63], a central nerve-derived neoplasm, which does not belong to the family of sympathic peripheral neuroblastic tumors (PNTs). In our study, both patients with a detected variant of OR6C68 have in addition important risk factors as an 11q deletion and age above or equal to 18 months at the time of diagnosis.
Titin (TTN) is one of the longest genes in the human genome and therefore exposed to a higher risk of random mutations. Nevertheless, this pure statistical statement does not exclude the possibility of biologically relevant mutations in this gene [64]. Mutations in TTN have been previously observed in NBL and olfactorial NBL; however, no obvious conclusion of its biological function was described [65][66][67].
We observe mutated variants of SHROOM3 in patients presenting an 11q deletion with additional detected mutations in DMD, PRUNE2, and RHGB genes. There is no established relation between SHROOM3 and NBL, but SHROOM3 has been identified as a strong candidate involved in the pathogenesis of craniofacial microsomia, which is a disease believed to be partially caused by disturbances of neural crest cells during embryogenesis [68]. NBL arise like other PNTs of sympathic origin from neural crest-derived cells. Thus, sequence variations of SHROOM3 might occur, and a possible impact of NBL development is not unlikely. Mutations in SHROOM3 have also been observed in relapsed acute lymphoblastic leukaemia [69].
Different non-synonymous variants for UTRN located on chromosome 6q were detected in two of the patients (1 and 4). We observed different variants of EIF3A gene in the primary tumor of patient 4 and the relapse sample of patient 1. The 6q24 region is commonly deleted in several types of cancers. Mutations in this region have previously been observed in NBL [70]. However, none of our patients with detected variants showed CNVs in that region (unpublished data).
Our findings demonstrate a remarkable divergence in both clinical and molecular characteristics of NBL, highlighting again the registered enormous heterogeneity observed in this disease.
In summary, we were able to confirm an unfavourable effect of mutations in RAS-MAPK, RET, or p53 signalling pathway genes. Moreover, some reported variants correlated with the occurrence of relapses and fatal outcome of the disease. In addition, we detected mutated variants of immune-response genes in the majority of relapse tumor samples. Primary refractory or relapsing disease significantly limits the survival of high-risk NBL patients.
Comparing primary versus relapse tumor samples, only nine shared genes identified in patient 1 may have a driver gene potential for tumor evolution. Concerning the other patients, no such shared genes were classified into Tier 3, suggesting that these genes are more likely passenger-genes, or alternatively, that the relapse is a rare de novo tumor. However, dividing driver-versus passenger-genes is challenging and should be addressed in larger cohorts or functional studies.
Genetic diversity complicates our understanding of treatment failure. Hopefully, our study will complement to the existing knowledge in the field and aid to select genes which in subsequent functional studies might prove to act as potent future biomarkers in NBL.

Ethical statement
The study protocol was approved by the Regional Committee for Medical and Health Research Ethics (REK nr: 2014/2010/REK Sør-Øst C). For all patients, written informed consents have been signed and approved by the patient or the parents of the patient, depending of the age at time of diagnosis (below/above 16 years) (REK nr: 2014/2010/REK Sør-Øst C). The parents made a voluntary and deliberate decision regarding their child's participation in the study, based on what is best for their child, their child's opinion, as well as their own perspectives. All methods were performed in accordance with the relevant guidelines and regulations enacted by the Genomics and Bioinformatics Core Facility, Oslo University Hospital, the South-Eastern Norway Regional Health Authority and the University of Oslo.

Patient material
Primary and relapse tumor samples were collected for diagnostic purpose at the Department of Pathology at Oslo University Hospital Radiumhospitalet, Oslo, Norway. According to the INRG pre-treatment risk classification [9,10] 16 patients included in the study were primarily diagnosed with high-risk NBL, one with intermediate risk, and one with very low risk disease. Patients have been treated individually but all following the HR-NBL1/SIOPEN protocol (ClinicalTrials.gov:NCT01704716) or non-HR NBL protocols when diagnosed with intermediate-or very low risk (ClinicalTrials.gov:NCT01728155) [71][72][73][74][75]. All high-risk cases received high dose treatment. The relapsed patients did not follow defined protocols. Blood or BM from included patients, were used as normal control material for WES analysis.

DNA isolation
For DNA extraction different extraction kits were used, depending on the origin of the material: for fresh frozen primary tumor samples the Qiagen Allprep kit or QIAamp DNA mini kit was applied (Qiagen), for bone marrow cells the QIAamp DNA mini kit was used, and for blood the Qiagen EZ1 DNA blood kit was utilized (all Qiagen). For the majority of the samples, an additional cleaning step was applied using the Genomic DNA Clean & Concentrator TM10 (Zymo Research). DNA was quantified using the Qubit dsDNA HS Assay Kit (Invitrogen).

Whole exome sequencing
The library preparation was performed using the Agilent SureSelect Human All Exon V5 following the default protocol and sequencing was performed on the HiSeq2500 using SBS chemistry V3 and paired end-sequencing (2x100bp).

Data analysis
Variant calling. The raw reads from each sample, in FastQC format, were mapped (lanewise) using BWA MEM to the human reference genome (build b37 with an added decoy contig, obtained from the GATK resource bundle) [76]. Sample-wise sorting and duplicate marking was performed on the initial alignments with Picard tools (http://broadinstitute. github.io/picard). GATK tools were subsequently used for two-step local realignment around indels, with matching samples (i.e., primary tumor and its corresponding normal) being processed together [77]. Each sample's pair-end read information was checked for inconsistencies with Picard, and base-quality recalibration was performed by GATK. Somatic variant calling on the matching paired samples was done by using the intersection of MuTect and Strelka [78,79]. Block substitutions were defined as somatic mutations at consecutive positions, where the variant allelic frequency of each was within 5% of the average allelic frequency of the two variants. GATK tools were used for computing coverage statistics based on the recalibrated alignment files. Details of the variant calling pipeline have been described elsewhere [80].
Variant annotation. Functional annotation of somatic variants were detected using the PCGR [81]. The detected variants were categorized into Tier 1 (strong clinical significance), Tier 2 (potential clinical significance), Tier 3 (uncertain clinical significance), or Tier 4 (other non-synonymous). The VCF files from the variant calling pipeline were compressed and indexed using bgzip and tabix, respectively, as recommended by PCGR. The PCGR script, pcgr.toml, was modified to turn off VCF validation and configured with specific parameters (Peripheral_Nervous_System_Cancer_NOS) for this analysis, as NBL falls inside this main category. The PCGR provided a list of variants in each of our 18 patients. In the PCGR, every somatic variant presented to the program is classified either as Tier 1, Tier 2, Tier 3, Tier 4, or synonymous variant. Tier 1 variants are variants known to be of strong clinical significance for the cancer type specified by the user, in our case NBL. Tier 2 variants are described as variants with potential clinical significance: either strong evidence that the variant has a clinical significance in another cancer type or weak evidence that the variant has clinical significance in the cancer type specified by the user. Tier 1 and Tier 2 variants have to be classified in the CIViC database or the Cancer Biomarkers Database [82,83]. Tier 3 variants are variants of uncertain clinical significance, which are unspecified non-synonymous variants located within a known tumor suppressor gene or proto-oncogene. All other non-synonymous variants are classified into Tier 4.
The PCGR uses mutation effect predictors to estimate the biological effect of the non-synonymous variants. In this study, the mutation effect predictor MutationAssessor was chosen to identify biologically relevant variants [39]. When MutationAssessor predicts a variant with damaging effect, the amino acid change caused by that variant is predicted to cause damage to the protein product that impacts the function of the protein. If the variant is predicted to be tolerated, this means that the resulting amino acid change in the protein is predicted to have no functional impact. In this study, we use this definition for a variant to be: predicted to be damaging or predicted to be tolerated. Variant annotation was performed in this study by using the MutationAssessor with high sensitivity as a variant effect predictor [84]. This choice was necessary to avoid losing data of possible interest. Therefore, there might be some false positives (variants that are not damaging) amongst the variants that we classify as biologically relevant, hence the functional importance of candidate genes and variants should be validated.
The other feature of PCGR utilized in this study is Oncoscore [85]. This is a score between 0 and 1 expressing the frequency of whether a gene has been reported in relation to cancer in the scientific literature. A low score represents a low association, while a high score represents high association. If a variant is classified into Tier 4, Oncoscore can help identify its potential as a target for further investigation.
Statistical analysis. Data analyses after utilizing PCGR, including frequency analysis, statistics, and plotting, were performed using the programming language R in the integrated development environment RStudio. Various statistical algorithms for analysis of sequencing data were tested and evaluated. Genes were ranked based on their average amount of mutations across patients (G), their average amount of mutations normalised by gene length (nG), and the total number of patients with mutations in the genes (B).
Pathway analysis. To identify pathways impacted by the identified mutated variants GOSeq analysis was performed. [86] The analysis was carried out separately for each sample and all genes with at least one somatic mutation (Tier 1-Tier 4) based on the PCGR analysis were included. We looked at mutations in the coding region. The output of this analysis was a ranking of the Gene Ontology (GO) categories according to the number of samples for which the GO category was significant (p<0.05).
Supporting information S1  Table. Functional analysis results. Functional analysis results of the genes with somatic mutations was performed using GOSeq (1). The analysis was carried out separately for 18 primary tumor and 6 relapsed samples, performed based on the coding mutations. All genes with at least one somatic mutation (Tier 1-Tier 4) detected by the PCGR analysis were included. Information about number of samples with detected pathways and number of genes involved in the certain pathway is provided.